5 research outputs found

    Efficient mixture model for clustering of sparse high dimensional binary data

    Clustering is one of the fundamental tools for preliminary data analysis. While most clustering methods are designed for continuous data, sparse high-dimensional binary representations have become very popular in domains such as text mining and cheminformatics. Applying classical clustering tools to this type of data usually proves very inefficient, both in computational complexity and in the utility of the results. In this paper we propose a mixture model, SparseMix, for clustering sparse high-dimensional binary data, which connects model-based with centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on the EM algorithm, SparseMix (1) is specially designed for processing sparse data; (2) can be efficiently realized by an on-line Hartigan optimization algorithm; and (3) describes every cluster by its most representative vector. Extensive experiments on various types of data confirmed that SparseMix builds partitions with a higher compatibility with reference groupings than related methods. Moreover, the constructed representatives often better reveal the internal structure of the data.
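The idea of describing each cluster by a binary representative and refining assignments with an on-line Hartigan procedure can be illustrated with a minimal sketch. This is not the SparseMix cost function, which the abstract does not state; as a stand-in assumption, the cost here is the total Hamming distance of each member to its cluster's majority-vote representative. The function names `cluster_cost` and `hartigan_binary` are hypothetical, and dense NumPy arrays are used for brevity rather than a sparse format.

```python
import numpy as np

def cluster_cost(ones, n):
    # Total Hamming distance of n members to their majority-vote
    # representative, given per-dimension counts of ones.
    return int(np.minimum(ones, n - ones).sum())

def hartigan_binary(X, k, n_iter=20, seed=0):
    # Assumes X is an (n, d) array of 0/1 values with n >= k.
    X = np.asarray(X, dtype=np.int64)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n)
    labels[:k] = np.arange(k)  # make sure no cluster starts empty
    ones = np.stack([X[labels == c].sum(axis=0) for c in range(k)])
    sizes = np.bincount(labels, minlength=k)
    for _ in range(n_iter):
        moved = False
        for i in range(n):
            a = labels[i]
            if sizes[a] == 1:
                continue  # never empty a cluster
            x = X[i]
            # Cost saved by taking x out of its current cluster.
            saved = cluster_cost(ones[a], sizes[a]) - cluster_cost(ones[a] - x, sizes[a] - 1)
            best_b, best_delta = a, 0
            for b in range(k):
                if b == a:
                    continue
                # Cost added by putting x into cluster b.
                added = cluster_cost(ones[b] + x, sizes[b] + 1) - cluster_cost(ones[b], sizes[b])
                if added - saved < best_delta:
                    best_b, best_delta = b, added - saved
            if best_b != a:
                # On-line update: statistics change immediately after a move.
                ones[a] -= x
                sizes[a] -= 1
                ones[best_b] += x
                sizes[best_b] += 1
                labels[i] = best_b
                moved = True
        if not moved:
            break
    # Representative of each cluster: per-dimension majority vote.
    reps = (2 * ones > sizes[:, None]).astype(int)
    return labels, reps
```

Because a point is moved only when the total cost strictly decreases, the loop terminates, although, like any Hartigan-style method, it may stop in a local minimum that depends on the random initialization.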

    Statistical analysis of the topological structure of knots

    The problem of knot recognition has always been at the core of knot theory. Over the years, many invariants have been created to distinguish knots. The introduction of the knot group and its properties reduced the topological problem of homeomorphism to an algebraic problem of isomorphism, and many group invariants have been used. One such invariant, introduced recently, shows promising results. It is accompanied by an algorithm for computing the knot group presentation that aims to simplify the presentation as much as possible. The purpose of this paper is to analyse the performance of this invariant and compare it with existing invariants using empirical data.

    Split-and-merge Tweak in Cross Entropy Clustering

    Part 3: Data Analysis and Information Retrieval. To address the local convergence problem of the Cross Entropy Clustering algorithm, a split-and-merge operation is introduced to escape from local minima and reach a better solution. We briefly describe the theoretical aspects of the method, present a few strategies for tweaking the clustering algorithm, and compare them with existing solutions. The experiments show that the presented approach increases the flexibility and effectiveness of the whole algorithm.

    The actions of research institutes to support adaptation to climate change

    A crucial part of all adaptation planning and disaster risk reduction is the estimation of vulnerable areas and future risk. Only a well-developed monitoring system can provide the information needed to create possible scenarios and set up adaptation plans. Monitoring systems for meteorological conditions, surface water, groundwater, landslides, the seacoast, and agricultural drought, together with their standards and methodologies, are crucial for establishing an effective warning system in every country, and are therefore the subject of research conducted by national institutes. The conditions of this national research (trained staff, equipment, etc.) are thus essential for providing reliable information for a national adaptation plan and for the economic assessment of climate change impacts. Poland has significant experience in monitoring systems, data collection and visualization, as well as in the development of scenarios and risk maps. The methodologies, the capacity building necessary for their use, and the experience and lessons learned in obtaining valuable information for disaster risk reduction were presented by the authors, drawing on this research, at the 24th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (COP 24) in Katowice (December 2018). The presentation contributed to the global adaptation process through experience sharing, which is important for the relevant research conducted in the least developed countries.